Decontextualized learning for interpretable hierarchical representations of visual patterns
Authors
Abstract
Highlights
• We overcome a major limitation in using deep neural networks on small sample sizes
• Our approach achieves state-of-the-art interpretability scores of complex features
• The presented framework integrates analytical, virtual, and experimental approaches

The bigger picture
We present a fully featured approach to studying natural images. Our framework, decontextualized hierarchical representation learning (DHRL), overcomes the limitations of the datasets typical of studies in the natural sciences, enabling the application of unsupervised deep learning models to questions where data are much more limited. DHRL captures meaningful features and provides improved latent variable interpretation techniques. The models it provides can be used to perform a range of virtual experiments, transforming the way we study color patterns and removing the necessity for less explicit, and sometimes ethically problematic, approaches.

Summary
Apart from discriminative modeling, basic research utilizing convolutional neural networks on imaging data faces unique hurdles. Here, we present decontextualized hierarchical representation learning (DHRL), designed specifically to address these limitations. DHRL enables the broader use of small datasets, of the size typical of most studies. It also captures spatial relationships between features, provides novel tools for investigating latent variables, and improves disentanglement on such datasets. This is enabled by a preprocessing technique inspired by generative model chaining and a ladder network architecture with a latent-space regularization scheme. More than an analytical tool, its generative capabilities allow experiments to be performed directly on the learned representation, which may transform investigations of image data by integrating empirical and theoretical approaches.

Introduction
A key motivation for the expanded use of convolutional neural networks (CNNs) [1] lies in their capacity to outperform classical computer vision approaches on many tasks [2]. In the life sciences, researchers are leveraging CNNs for a broad range of domain-specific applications, such as the automated tracking of animal movement [3, 4, 5], the detection and classification of cell lines [6, 7, 8], and the mining of genomics data [9]. The ability to represent algorithmically useful features (expressivity; Box 1) underlies the success of these networks. This capacity to capture feature complexity, unparalleled in traditional approaches, suggests their usefulness across many research pathways. Nonetheless, their use has been limited by the low numbers of training samples available and by difficulties in interpreting their outputs. As such, hand-built algorithms [10, 11, 12, 13] continue to dominate despite their comparatively diminished expressive capacity.

Box 1. Key terms and definitions in context
Amortized inference: an efficient approximation of maximum likelihood training that maps inputs to distributions. In variational autoencoders (VAEs), amortized inference is performed by the inference (encoder) network.
Decontextualized learning: a term borrowed from psychology, where it is concerned with language learning in children as a new word is learned away from its here-and-now context. We use the analogy here to describe the process of breaking features out of their contexts within generated, "decontextualized" samples, part of our proposed procedure (see Decontextualized generation, under Experimental procedures).
Disentanglement: the degree to which independent factors of variation are represented by independent variables of the latent code. A meta-prior of key importance; building it into models contrasts with entangled representations, in which multiple factors map to a single variable.
Explaining away (in hierarchical models): when units at lower layers become coupled to those at higher layers, sampling one unit must cause a change in how all other units update their state. In hierarchical VAEs, samples may as a result depend on some, or even all, of the latent variables.
Expressivity (also, capacity): the relative amount of complexity (of the functional relationship between inputs and outputs) captured by an approach. For example, deep networks have greater expressivity than linear transformations such as singular value decomposition. A code can be viewed as parameterized by a more (or less) expressive model; network depth is a common proxy for expressivity.
Hierarchical features: features created by the combination of lower-level features at increasing scale, down to a terminal set of atomic features (at the lowest scale). Hierarchical codes, wherein the features produced at each prior level are combined at the next to create higher-level features, yield increasingly expressive models (see Variational autoencoder, under Experimental procedures).
Information preference problem: when the generative network (decoder) learns to generate outputs of high quality without reference to the latent code. The result is an uninformative code, like the noise vector used in generative adversarial network modeling.
Mean-field assumption (of pixel-wise comparison): unlike feedforward networks, pixel-wise error metrics typically require only the preservation of realistic output at each pixel; they do not preserve translation invariance, contexts, or surrounding features.
Meta-priors: general (not task-specific) assumptions about how data are organized that are built into algorithms and enforced during training. Examples include sparsity, temporal coherence, and the presence of manifolds in high-dimensional space (see Box 2).
Mutual information: a metric describing how much the information held in one variable (the code) informs us about another (the sample data).

Here we address the hurdles of applying expressive models outside discriminative tasks, providing a highly extensible, integrated framework. In doing this, we identify the key functionalities of such a framework; it should: (1) provide a representation that disentangles factors of variation along interpretable axes; (2) capture hierarchical features (Box 1); (3) incorporate existing domain knowledge where available; (4) allow statistical analyses of traits; and (5) make direct connections with experiments (i.e., it should integrate experimental, theoretical, and analytical approaches). In contrast to discriminative models, such a framework seeks to find unknown patterns and offers an alternative to compression, clustering, and feature extraction techniques. Generative techniques, i.e., generative adversarial networks (GANs) [14] and variational autoencoders (VAEs) [15, 16], are especially effective at representing complex features and generating photorealistic examples. In addition to increased expressivity, stacks of VAEs offer intuitive latent variable analysis. An extension of amortized inference, VAEs combine a generative model with an encoder that performs approximate inference of the posterior distribution over a low-dimensional code (qϕ(z|x)); the decoder is conditioned on that code (pθ(x|z)). Instead of optimizing a specific task, the objective function can take a variety of forms that maximize fit (e.g., by minimizing reconstruction error) while minimizing divergence from a prior. This means we can evaluate likelihoods, estimate distributions, and generate new samples. Despite these qualities, however, VAEs by themselves do not make a strong basis for such a framework: several outstanding issues limit their development and application. These include the information preference problem, the restrictive mean-field assumption of pixel-wise error metrics, explaining away, and entanglement (Box 2).
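The encoder/decoder pairing described above can be illustrated with a minimal sketch. This is not the architecture used in the paper: the linear maps, dimensions, and weight initialization below are illustrative placeholders, showing only the amortized inference step qϕ(z|x), the reparameterized sample of z, and the conditional decoder pθ(x|z).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Toy amortized inference network q_phi(z|x): a linear map producing
    the mean and log-variance of a diagonal Gaussian over the code z."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping the
    sample differentiable with respect to mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, W_dec):
    """Toy generative network p_theta(x|z): a linear map back to data space."""
    return z @ W_dec

# Tiny example: 4 samples, 8-d data, 2-d latent code.
x = rng.standard_normal((4, 8))
W_mu, W_logvar = rng.standard_normal((8, 2)), rng.standard_normal((8, 2))
W_dec = rng.standard_normal((2, 8))

mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
x_hat = decode(z, W_dec)

# One term of a VAE-style objective: pixel-wise (mean-field) reconstruction error.
recon_error = np.mean((x - x_hat) ** 2)
```

In a real VAE this reconstruction term would be combined with a divergence penalty on qϕ(z|x) against the prior, and both networks would be deep rather than linear.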
Although approaches exist to address each of these shortcomings (Table 1), they have not been unified; they could be better combined into meaningful extensions. Another lingering concern is how well these methods apply to modestly sized datasets. Whereas many techniques were developed using large datasets such as CelebA [17] (>200,000 samples) and dSprites [18] (>700,000 samples), dataset sizes in the natural sciences are often orders of magnitude smaller.

Box 2. Combining VAE extensions to address their shortfalls
Despite their promise, variational autoencoders (VAEs) face several outstanding issues that limit their applications (see Box 1 for definitions). The scope of the proposed remedies varies, but recent work offers potential solutions to many of these issues, including changes to the objective function [19, 20, 21] and specialized architectures [22, 23]. Outside this literature, there are also techniques for measuring large-scale perceptual distances [24, 25], decreasing dependence on pixel-level features [26], and capturing important components of features [27] that have not yet been applied in this context of generative modeling. Here, we combine these contributions into a robust approach to generative modeling.
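One common way to measure perceptual rather than pixel-level distance is to compare Gram matrices of feature maps, as in style-transfer work. The sketch below uses random arrays in place of real CNN feature maps, so it only illustrates the computation, not a trained perceptual metric.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map:
    channel-by-channel inner products, normalized by map size."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)

def style_distance(feat_a, feat_b):
    """Mean squared difference between Gram matrices: a texture-level
    measure of visual similarity that ignores exact pixel positions."""
    return float(np.mean((gram_matrix(feat_a) - gram_matrix(feat_b)) ** 2))

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 8, 8))       # stand-in for a CNN feature map
b = rng.standard_normal((3, 8, 8))       # a second, unrelated map

assert style_distance(a, a) == 0.0       # identical maps: zero distance
assert style_distance(a, b) > 0.0        # differing maps: positive distance
```

Because the Gram matrix aggregates correlations across the whole map, two images can be "close" under this distance while differing pixel by pixel, which is exactly the behavior a pixel-wise error metric cannot provide.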
The variational ladder autoencoder (VLAE) [22] proposes to mitigate the explaining-away problem and to encourage the encoding of features based on their scale. Using VLAEs, features are separated by scale across the levels of the ladder. How we measure the divergence between distributions can lead to a trade-off with model fit; we therefore choose an information-preserving, kernel-based technique grounded in mutual information: maximum mean discrepancy (MMD) [28, 29]. Unlike the Kullback-Leibler (KL) divergence commonly used in VAEs, MMD does not suffer from high variance (overfitting) of the code [20]. Unlike the KL divergence, it also makes a less restrictive independence assumption. Previous work has focused on adjustments to the objective (e.g., β-VAEs) [19]; these come with the overhead of additional hyperparameters (e.g., annealing schedules) [30]. Finally, while VLAEs are less restrictive, pixel-wise reconstruction loss can still undermine the representation by emphasizing pixel-level losses. To balance this effect we use an additional perceptual loss function [24]. Commonly applied in style transfer [31, 32], such loss functions capture the effects of abstract measures of visual similarity across scales. Together, this powerful encoder and decoder combination (details under Experimental procedures) gives a modern approach to studying natural patterns.

Table 1. Desired characteristics of an integrative analysis tool, the representation meta-priors [33] they correspond to, and previously proposed enforcement strategies

Desired characteristic | Representation meta-prior [33] | Example approach
Disentangling factors of variation | Limited number of shared factors of variation | Latent space regularization [19]
Capturing spatial relationships | Hierarchical organization of the representation | Hierarchical architecture [22]
Incorporating domain knowledge | Local manifolds | Structured latent codes [34]
Connecting analyses and experiments | Local manifolds | Generative models [14, 15, 16]
Inference | Probability mass on manifolds | Variational inference [15]
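The MMD metric referenced above has a simple sample-based estimator: compare the average kernel similarity within each sample to the average similarity across samples. A minimal numpy sketch, using a Gaussian RBF kernel with a fixed (assumed, not tuned) bandwidth:

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = (np.sum(x**2, axis=1)[:, None]
                + np.sum(y**2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared maximum mean discrepancy (Gretton et al.):
    near zero when x and y are drawn from the same distribution, and
    growing as the two distributions pull apart."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)))
shifted = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)) + 3.0)

assert same < shifted  # the shifted sample is flagged as a different distribution
```

In a VAE-style objective, `x` would be a batch of codes drawn from qϕ(z|x) and `y` a batch drawn from the prior, penalizing the aggregate posterior as a whole rather than each code independently, which is the less restrictive behavior noted above.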
Similar articles
Extracting Visual Patterns from Deep Learning Representations
Vector-space word representations based on neural network models can include linguistic regularities, enabling semantic operations based on vector arithmetic. In this paper, we explore an analogous approach applied to images. We define a methodology to obtain large and sparse vectors from individual images and image classes, by using a pre-trained model of the GoogLeNet architecture. We evaluat...
Full text
Learning interpretable representations of biological data
The increasing ease of collecting genome-scale data has rapidly accelerated its use in all areas of biomedical science. Translating genome-scale data into testable hypotheses, on the other hand, is challenging and remains an active area of method development. In this talk we present two machine learning approaches to deduce data representations that are inspired by a mechanistic understanding of ...
Full text
InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, bui...
Full text
Interpretable Low-rank Document Representations with Label-dependent Sparsity Patterns
Abstract. In the context of document classification, where the label tags of a corpus of documents are readily known, an opportunity lies in utilizing label information to learn document representation spaces with better discriminative properties. To this end, in this paper an application of Variational Bayesian Supervised Nonnegative Matrix Factorization (supervised vbNMF) with label-driven spars...
Full text
Image representations for visual learning.
Computer vision researchers are developing new approaches to object recognition and detection that are based almost directly on images and avoid the use of intermediate three-dimensional models. Many of these techniques depend on a representation of images that induces a linear vector space structure and in principle requires dense feature correspondence. This image representation allows the use...
Full text
Journal
Journal title: Patterns
Year: 2021
ISSN: 2666-3899
DOI: https://doi.org/10.1016/j.patter.2020.100193